MAP speaker adaptation of state duration distributions for speech recognition
Abstract
This paper presents a framework for maximum a posteriori (MAP) speaker adaptation of state duration distributions in hidden Markov models (HMMs). It tackles four key issues of MAP estimation: the analysis and modeling of state duration distributions, the choice of prior distribution, the specification of the parameters of the prior density, and the evaluation of the MAP estimates. It also compares the approach with an adaptation procedure based on maximum likelihood (ML) estimation and addresses the truncation of the state duration distribution from a statistical point of view. The results suggest that speaker adaptation of temporal restrictions substantially improves the accuracy of speaker-independent (SI) HMMs with both clean and noisy speech. The method has a low computational load, requires only a small number of adaptation utterances, and can be useful for tracking the dynamics of speaking rate in speech recognition.
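To make the contrast between ML and MAP adaptation concrete, the following is a minimal sketch, not the paper's actual estimator. It assumes a textbook case: the mean duration of one HMM state is adapted under a conjugate Gaussian prior centered on the SI value, with a hypothetical weight `tau` controlling how strongly the prior is trusted. The function names, the prior form, and all numbers are illustrative assumptions; the abstract does not state which duration model or prior the authors use.

```python
def ml_mean(durations):
    """ML estimate of the mean state duration: the plain sample mean."""
    return sum(durations) / len(durations)


def map_mean(durations, prior_mean, tau):
    """MAP estimate of the mean under an assumed conjugate Gaussian prior.

    prior_mean: SI mean duration used as the prior center (assumption).
    tau: hypothetical prior weight, acting like a count of pseudo-observations.
    """
    n = len(durations)
    return (tau * prior_mean + sum(durations)) / (tau + n)


if __name__ == "__main__":
    si_mean = 8.0            # hypothetical SI mean duration, in frames
    adaptation = [5, 6, 5]   # hypothetical durations from adaptation utterances
    print("ML :", ml_mean(adaptation))
    print("MAP:", map_mean(adaptation, si_mean, tau=3.0))
```

With few adaptation utterances the MAP estimate stays between the SI prior and the speaker's sample mean, and it approaches the ML estimate as more data arrive or as `tau` goes to zero.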
Similar resources
Speaker Adaptation in Continuous Speech Recognition Using MLLR-Based MAP Estimation
A variety of methods are used for speaker adaptation in speech recognition. In some techniques, such as MAP estimation, only the models with available training data are updated. Hence, large amounts of training data are required in order to have significant recognition improvements. In some others, such as MLLR, where several general transformations are applied to model clusters, the results ar...
Speaker adaptation of output probabilities and state duration distributions for speech recognition
This paper presents a comparison of maximum a posteriori (MAP) speaker adaptation of state duration distributions and output probabilities in HMMs. Both adaptation procedures are compared and then combined in recognition experiments with clean and noisy signals. The results shown here suggest that the state duration distribution adaptation can lead to higher improvements than the adaptation of o...
Testing the Hypothesis of Multivariate Normality in Bayesian Approaches to Speaker Adaptation
Bayesian approaches to speaker adaptation are popular in Automatic Speech Recognition (ASR) systems. In most kinds of Bayesian adaptation, there are parameters whose prior distributions are assumed to be multivariate normal. This paper presents a methodology that can test the hypothesis of multivariate normality. When applied to Maximum A Posteriori (MAP) adaptation, we found that the real pri...
Average-Voice-Based Speech Synthesis
This thesis describes a novel speech synthesis framework, "Average-Voice-based Speech Synthesis." With this framework, synthetic speech of arbitrary target speakers can be obtained robustly and steadily even if only a very small number of speech samples are available for the target speaker. The framework consists of a speaker normalization algorithm for the parameter cluster...
Journal:
- IEEE Trans. Speech and Audio Processing
Volume 10
Publication date: 2002